Abstract 

Finding pertinent information is not limited to search engines. Online communities can amplify the 
influence of a small number of power users for the benefit of all other users. Users' information foraging 
in depth and breadth can be greatly enhanced by choosing suitable leaders. For instance, in delicious.com 
users subscribe to leaders' collections, which leads to a deeper and wider reach not achievable with search 
engines. To consolidate such collective search, it is essential to utilize the leadership topology and identify 
influential users. Google's PageRank, as a successful search algorithm in the World Wide Web, turns out 
to be less effective in networks of people. We thus devise an adaptive and parameter-free algorithm, the 
LeaderRank, to quantify user influence. We show that LeaderRank outperforms PageRank in terms of 
ranking effectiveness, as well as robustness against manipulations and noisy data. These results suggest 
that leaders who are aware of their clout may reinforce the development of social networks, and thus the 
power of collective search. 

Author Summary 
Introduction 

Many social networks such as twitter.com and delicious.com allow millions of users to interact, and 
some members hold much larger influence than others. Identifying these influential users is not easy. 
Yet it is essential to identify them in social networks: what an online community can collectively achieve 
is to enhance the power of individuals in discovering new information in depth and breadth that no 
individual can even contemplate, and an effective way is to make use of influential users. We take the 
World Wide Web as an example. Though many useful pages are out there, the sheer size of WWW 
creates a great barrier for comprehensive information exploration. Besides search engines which can be 
of great help, there is another mode of information acquisition through leveraging the network power, 
getting useful webpages from different experts. This collective search through social networks [1,2] may 
one day complement the current search paradigm which is based on isolated users shooting queries. 

Delicious.com, previously known as del.icio.us, is a representative case that we will focus on in this 
paper. Its primary function for individuals is to collect useful bookmarks with tags, such that thousands 
of bookmarks can be easily recalled. But for many users, its new function of networking people is more 
interesting. In delicious.com, users can select other users to be their leaders, in the sense that the 
bookmarks of the leaders are often useful and subscriptions to these bookmarks will be automatic. The 
subscribers, which we call fans, can in turn be the leaders of other users. Out of the 7 million users, which 
is still a rapidly increasing number in delicious.com, about half a million users are linked in a big cluster 
by these leader-fan relations. We call this big cluster the leadership network. This seemingly 
small group actually includes the most active users. 

Although this leadership network is highly informative for leader identification, making good use of the 
network is challenging [3-7]. First of all, the leadership structure is complex, and going upstream by 
indefinitely climbing up the ladder of leaders is not illuminating. In addition, considering 
the leaders alone provides no absolute measure of influence, as it is the entire upstream connection which 
acts as the information source and contributes to the influence of a user. Similarly, as we shall see in our 
experiments, merely counting the number of fans is not a good way to quantify the leader significance. 
A sophisticated model, however, could reveal the intrinsic structure and identify the worthy leaders. 

To make good use of the leadership network, we shall devise a method akin to PageRank [8,9], which effectively 
ranks webpages based on the hyperlink network. However, the leadership network is fundamentally 
different as personal relationships are quickly evolving, which makes adaptability essential for ranking 
users. For instance, the probability which describes the random information acquisition should self- 
adjust when users add or remove leaders. While this probability is governed by an external parameter 
in PageRank, we devise our LeaderRank algorithm where this probability is adaptive and personalized, 
leading to a parameter-free algorithm readily applicable to any type of graph. This advantage eliminates 
the frequent need for parameter tests and calibration of PageRank on fast evolving networks. Simulations 
show that our LeaderRank algorithm outperforms PageRank in identifying users who lead to quick and 
wide spreading of useful items. Moreover, LeaderRank is more tolerant of noisy data and robust against 
manipulations. 

In addition to ranking, the present study may shed light on the future design of community rules and 
online social networks. Leader identification reinforces well-placed individuals to go deeper and wider in 
information exploration, where the whole society benefits from the collective outputs. A robust ranking 
algorithm also discourages people from manipulations [10]. In this paper, we will compare ranking based 
on the leadership network with simple ranking based on the number of fans. By conducting simulations 
and experiments, we will see how ranking algorithms identify influential users in social networks. 
Interested readers may try the webpage http://rank.sesamr.com, where we implement LeaderRank to rank 
users in delicious.com. 

Methods and Materials 

In many online applications, users are able to select other users to be their sources of information. We 
represent these user-user relations by a network with directed links pointing from fans to their leaders. 
The link direction corresponds to votes from fans for their leaders, and popular leaders would have a large 
number of in-links. We take this convention as it matches the direction of random walk in our algorithm, 
though the direction of information flow in the network is opposite, i.e. from leaders to fans. Our aim is 
to rank all the users based on this network topology. 

LeaderRank 

We consider a network of N nodes and M directed links. Nodes correspond to users and links are 
established according to the relations among leaders and fans. To rank the users, we introduce a ground 
node which connects to every user through bidirectional links (see Fig. 1 for an illustration). The network 
thus becomes strongly connected and consists of N + 1 nodes and M + 2N links. To start the ranking 
process, we assign to each node, except for the ground node, one unit of resource which is then evenly 
distributed to the node's neighbors through the directed links. The process continues until steady state 
is attained. Mathematically, this process is equivalent to a random walk on the directed network, and is 
described by the stochastic matrix P with elements $p_{ij} = a_{ij}/k_i^{\mathrm{out}}$ representing the probability that a 
random walker at i goes to j in the next step. Here $a_{ij} = 1$ if node i points to j and 0 otherwise, while 
$k_i^{\mathrm{out}}$ denotes the out-degree, i.e. the number of leaders, of i. This probability flow thus corresponds to the 
vote from fan i to leader j. Denoting by $s_i(t)$ the score of node i at time t, we have 

$$ s_i(t+1) = \sum_{j=1}^{N+1} \frac{a_{ji}}{k_j^{\mathrm{out}}} \, s_j(t). \qquad (1) $$

The initial scores are given by $s_i(0) = 1$ for all nodes i (other than the ground node) and $s_g(0) = 0$ for 
the ground node. 

The presence of the ground node makes P irreducible, as the network is strongly connected. The 
ground node also ensures the co-existence of loops of size 2 and 3 from any node, which implies that $P^6$ is 
positive, i.e. all elements of $P^6$ are greater than zero. As $P^n$ is positive for some natural number n, the 
non-negative P is primitive. By the Perron-Frobenius theorem, P has the maximum eigenvalue 1 with 
a unique eigenvector. We outline the proof of primitivity and convergence in Supporting Information 
(SI). The score $s_i(t)$ for all i thus converges to a unique steady state denoted as $s_i(t_c)$, where $t_c$ is the 
convergence time. At the steady state, we evenly distribute the score of the ground node to all other 
nodes to conserve scores on the nodes of interest. Thus we define the final score of a user to be the 
leadership score S, namely 

$$ S_i = s_i(t_c) + \frac{s_g(t_c)}{N}, \qquad (2) $$

where $s_g(t_c)$ is the score of the ground node at steady state. Based on the above properties, there are 
several advantages of applying LeaderRank in ranking, which include: (i) parameter-freeness, (ii) wide 
applicability to any type of graph, (iii) convergence to a unique ranking, and (iv) independence of the 
initial conditions. 

To illustrate the ranking process, we provide a simple ranking example in Fig. 1. After convergence, 
the final scores of the six users are $S_1 = 1.0426$, $S_2 = 1.1787$, $S_3 = 0.9909$, $S_4 = 0.8929$, $S_5 = 0.9745$ and 
$S_6 = 0.9205$, respectively. Therefore, user 2 is ranked top by the LeaderRank algorithm. 
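
The ranking process described in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `leaderrank`, the convergence tolerance and the toy network are assumptions, and the out-degree used to split a node's score is taken to include its link to the ground node so that every row of P sums to one.

```python
def leaderrank(n_users, links, tol=1e-12, max_iter=100_000):
    """Return LeaderRank scores for users 0..n_users-1 (n_users >= 1).

    links: iterable of (fan, leader) pairs, i.e. a directed link fan -> leader.
    """
    ground = n_users  # index of the extra ground node
    out = [set() for _ in range(n_users + 1)]
    for fan, leader in links:
        if fan != leader:
            out[fan].add(leader)
    for user in range(n_users):
        out[user].add(ground)   # every user points to the ground node
        out[ground].add(user)   # and the ground node points back

    # One unit of score per user, none on the ground node.
    score = [1.0] * n_users + [0.0]
    for _ in range(max_iter):
        nxt = [0.0] * (n_users + 1)
        for i, targets in enumerate(out):
            # Assumption: the score is split evenly over all out-links,
            # including the link to the ground node.
            share = score[i] / len(targets)
            for j in targets:
                nxt[j] += share
        change = sum(abs(a - b) for a, b in zip(score, nxt))
        score = nxt
        if change < tol:
            break

    # Hand the ground node's score back to the users in equal parts.
    bonus = score[ground] / n_users
    return [score[i] + bonus for i in range(n_users)]


scores = leaderrank(2, [(1, 0)])   # user 1 is a fan of user 0
print(scores)                       # user 0 ends up with the larger score
```

Because every node has at least one out-link, the total score is conserved at each step, so the final scores sum to the number of users.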



PageRank 

We briefly describe the PageRank algorithm, with which we compare our ranking results. PageRank 
forms the basis of the Google search engine and represents a random walk on the hyperlink network. A 
parameter c is introduced as the probability for a web surfer to jump to a random website, and 1 - c is 
the probability for the web surfer to continue browsing through hyperlinks; c is thus called the return 
probability, i.e. the probability that the web surfer returns and starts a new random walk. In this case, 
the score $s_i(t)$ of a webpage i at time t is given by 

$$ s_i(t+1) = c + (1-c) \sum_{j=1}^{N} \left[ \frac{a_{ji}}{k_j^{\mathrm{out}}} \left( 1 - \delta_{k_j^{\mathrm{out}},0} \right) + \frac{1}{N} \, \delta_{k_j^{\mathrm{out}},0} \right] s_j(t), \qquad (3) $$

where $\delta_{a,b} = 1$ when a = b and 0 otherwise. The first and second term respectively correspond to the 
contributions from random surfers and from surfers arriving through hyperlinks. 
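
For comparison, the PageRank iteration with return probability c can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `pagerank`, the default c = 0.15 and the stopping rule are choices made here, and users without leaders (dangling users) are assumed to spread their score uniformly over all users.

```python
def pagerank(n_users, links, c=0.15, tol=1e-12, max_iter=100_000):
    """PageRank scores with return probability c; links are (fan, leader) pairs."""
    out = [set() for _ in range(n_users)]
    for fan, leader in links:
        if fan != leader:
            out[fan].add(leader)

    score = [1.0] * n_users
    for _ in range(max_iter):
        # Score held by dangling users is shared uniformly by everyone.
        dangling = sum(score[j] for j in range(n_users) if not out[j])
        nxt = [c + (1 - c) * dangling / n_users] * n_users
        for j, targets in enumerate(out):
            if targets:
                share = (1 - c) * score[j] / len(targets)
                for i in targets:
                    nxt[i] += share
        change = sum(abs(a - b) for a, b in zip(score, nxt))
        score = nxt
        if change < tol:
            break
    return score


print(pagerank(2, [(1, 0)]))  # user 0, the only leader, scores higher
```

Unlike the ground-node construction, the result depends on the externally chosen value of c.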

Before comparing the ranking results, we note several drawbacks in applying PageRank to social 
networks. Firstly, the return probability is essential in PageRank [8,9], as algorithmic convergence is only 
guaranteed on strongly connected networks. This introduces a parameter to the algorithm, and results 
in the frequent need for extensive tests on the parameter and evaluation metrics, which makes PageRank 
poorly suited to fast evolving social networks. In addition, the return probability is identical for all users 
irrespective of their significance. For dangling users (those without leaders), specific treatments are 
required to distribute all their probability back to the network uniformly [8]. All these drawbacks limit 
the potential of applying PageRank to rank users in social networks, as well as other ranking tasks. 

Differences between LeaderRank and PageRank 

An obvious difference between LeaderRank and PageRank lies in the formulation, where the ground node 
in LeaderRank plays an important role in regulating probability flows, making LeaderRank parameter- 
free. An essential difference lies in the heart of dynamics, as in LeaderRank the score flow to the 
ground node is inversely proportional to the number of selected leaders, while there is no such relation 
in PageRank. Mathematically, the score flow to the ground node is analogous to the return probability 
in PageRank, and the dependence of score flow on the number of leaders makes LeaderRank adaptive to 
fast evolving networks. The inverse proportion is reasonable, as nodes with a small number of leaders 
receive less information and hence acquire more information from the ground node (which corresponds 
to a larger score flow to the ground node). The same happens on the Internet, as web surfers on 
websites with small out-degree have a limited choice of hyperlinks and are more likely to jump to another 
random website. More detailed discussions are given in the first section of SI. 

Data description 

We apply the LeaderRank algorithm on the leadership network obtained from the world's largest online 
bookmarking website, delicious.com, to rank users according to their importance. Users in delicious.com 
are allowed to collect URLs as bookmarks, and are encouraged to select a list of leaders as sources of 
information. The dataset we test was collected in May 2008 and consists of 582377 
users and 1686131 directed links. Of these, 571686 users belong to the giant component, while the 
total number of users in other components is less than 0.1% of the giant component. The numbers of 
users in the second to fifth largest components are respectively 58, 53, 44 and 35. We thus study only 
the largest component. The number of directed links in the largest component is 1675008, of which 
338756 links (169378 pairs) are reciprocal. If the network is considered as an undirected network, the 
clustering coefficient [11] and assortativity coefficient [12] are respectively 0.241 and -0.012, while the 
average shortest distance between users is approximately 5.104. 
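
As a small illustration of how the reciprocal-link figures above relate to each other, the sketch below counts directed links, reciprocal links and reciprocal pairs in an edge list. The function name `reciprocal_stats` and the toy links are hypothetical; the delicious.com data itself is not reproduced here.

```python
def reciprocal_stats(links):
    """Return (number of links, reciprocal links, reciprocal pairs)."""
    # Drop duplicates and self-links before counting.
    edges = {(a, b) for a, b in links if a != b}
    reciprocal = sum(1 for a, b in edges if (b, a) in edges)
    # Each reciprocal pair contributes two directed links.
    return len(edges), reciprocal, reciprocal // 2


toy_links = [(1, 2), (2, 1), (1, 3), (3, 4), (4, 3), (2, 3)]
print(reciprocal_stats(toy_links))
```

The number of reciprocal links is always twice the number of reciprocal pairs, as with the 338756 links and 169378 pairs reported above.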

Results 

We first show the difference among the rankings obtained by LeaderRank, PageRank and the number of 
fans. Table 1 shows the top 20 users ranked by the three approaches. To have a preliminary evaluation of 
these ranking results, we compare the ranks with intrinsic qualities of the users which are independent of 
the ranking algorithm. Specifically, we compare the number of saved bookmarks, which may represent the 
activity of users. In particular, the users blackbeltjones, regine, zephoria and djakes, who appear in the top 
20 of LeaderRank but not in PageRank, have activity 5925, 6711, 1486 and 5082 respectively, compared 
to the generally smaller activity 3, 377, 1516 and 242 of the users thetechguy, cffcoach, samoore and kevinrose, who 
appear in the top 20 of PageRank but not in LeaderRank. This suggests that LeaderRank outperforms 
PageRank in identifying active users. 

More detailed results are given in SI. For instance, the table of the top 100 users is given in Table S1 
of SI. We have also examined the relation between scores and ranks for all the approaches, where Zipf's 
laws are observed and shown in Fig. S3 of SI. The overlap among the rankings obtained by LeaderRank, 
PageRank and the number of fans is shown in Fig. S4 of SI. By comparing the relationship between the 
number of leaders and rank (given in Fig. S5 of SI), we find that PageRank tends to assign a high rank to 
nodes with a small number of leaders. This is unfair to nodes with a large number of leaders, as users with a small 
number of leaders are not necessarily influential and manipulators may deliberately remove some leaders 
to improve their rank. In the following we compare, through simulations and experiments, LeaderRank, 
PageRank and ranking by the number of fans. 






Comparison with Ranking by the Number of Fans 

Ranking algorithms based on the network topology outperform ranking by merely the number of fans. We 
compare again user ranks with intrinsic qualities which are independent of the algorithm. One quantity 
which well characterizes the user influence is the number of times their collected bookmarks have been 
saved by the others. Though the leaders are not the only sources of bookmarks, influential users should 
still lead to wide spreading of their collected bookmarks. We denote the number of bookmarks collected 
by user i to be Bi and the number of times these bookmarks are saved to be Ui. A user who recommends 
only high quality bookmarks should have a large value of Ui/Bi. 

We show in Fig. 2 the number of fans of a user in descending order of his/her rank by LeaderRank. 
The size of the circles is proportional to the value of Ui/Bi. As we can see, there are users who are ranked 
high by LeaderRank but have only a small number of fans. Their ranks would greatly decrease if they 
are ranked by the number of fans. However, users highlighted with the red circles have relatively large 
Ui/Bi which shows that they are indeed high quality users. These users are identified by LeaderRank 
but not by the number of fans. On the contrary, there are users who have low rank but a large number 
of fans. The users highlighted with the blue circles have small Ui/Bi but a large number of fans. They 
are correctly ranked lower by LeaderRank. 

To better understand these users, we draw in Fig. 3 particular examples of users with a small number 
of fans but highly ranked, and users with a large number of fans but with a relatively low rank. As we 
can see in Figs. 3(a) and (b), users cffcoach and pedersoj are followed by fans with large values of Ui/Bi, 
represented by the large size of circles. Though users kanter and britta have more fans, we can see from 
Figs. 3(c) and (d) that they are surrounded by much smaller circles. LeaderRank correctly gives them a 
lower rank, as compared to the ranking by merely the number of fans. 

Similarly, the leaders alone provide no absolute measure of influence, as it is the entire upstream 
connection to leaders which acts as the information source and contributes to the influence of a user. We 
show in Fig. S6 of SI that removing all the leaders may have a negative effect on the social influence of a 
user. All these results suggest that the leadership network is much more informative than simple ranking 
criteria such as the number of fans or leaders, and thus algorithms which well utilize the topology can 
provide a better ranking. 

Comparison with PageRank 

In addition to identifying influential users, a good ranking algorithm for social networks should be tol- 
erant of noisy data and robust against manipulations. These goals are better achieved by considering 
the collective ranking based on network topology. In the following we compare the effectiveness and 
robustness of LeaderRank and PageRank, which also utilizes topology in ranking. 

Effectiveness 

How opinions spread and form in a community is an interesting question [13,14]. To effectively spread 
an opinion, one has to identify influential users and create an initial social inertia. For instance, companies 
may choose to start their adverts with influential leaders who are capable of initiating an extensive spreading 
through the Internet or SMS networks. Thus a smart algorithm which ranks influential users accurately is 
of great commercial value. On the other hand, an effective ranking algorithm may serve to identify 
influential users for immunization and stop epidemic outbreaks [15]. As an example, influential users who 
speed up junk mail spreading can be identified for targeted immunization. Here we show that LeaderRank 
is more capable than PageRank of identifying influential users who initiate a quicker and wider spreading. 

Specifically, we employ a variant of the SIR model to examine the spreading influence of the 
top-ranked users [16]. At each step, from every infected individual, one randomly selected fan gets infected 
with probability A, which resembles the direction of information flow. Infected individuals recover with 
probability $1/\langle k_{\mathrm{in}} \rangle$ at each step, where $\langle k_{\mathrm{in}} \rangle$ is the average in-degree of all users. To compare the ranking 
effectiveness, we set the initially infected users to be those who appear in the top 20 of either LeaderRank 
or PageRank (but not both) in Table 1, and compare the cumulative number of infected users (which 
includes infected and recovered users), denoted by Ni, as a function of time. The initially infected users for 
the two algorithms are given in the caption of Fig. 4. This experiment resembles an opinion spreading 
initiated from the top users, and we observe how the opinion propagates. Figure 4(a) shows that infecting 
the top users from LeaderRank results in a faster growth and a higher saturated number of infected, 
indicating a quicker and wider spreading. To further confirm the effectiveness of LeaderRank, we also 
conduct experiments for the top 50 and top 100 ranked users either from LeaderRank or PageRank and 
obtain similar results, which are shown in Figs. 4(b) and (c), respectively. 
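
The spreading experiment can be sketched as a short simulation. This is a minimal sketch under stated assumptions: the function name `sir_spread`, the dictionary `fans` mapping each user to a list of fans, and the seeded random generator are illustrative; only susceptible fans are assumed to become infected; and the recovery probability is passed in as a parameter rather than computed from the network.

```python
import random


def sir_spread(fans, seeds, infect_prob, recover_prob, steps, rng=None):
    """Return the cumulative number of infected users after each step.

    fans: dict mapping a user to the list of that user's fans.
    """
    rng = rng or random.Random(0)
    infected = set(seeds)
    recovered = set()
    history = [len(infected)]
    for _ in range(steps):
        newly = set()
        for user in sorted(infected):
            if fans.get(user):
                # Each infected user contacts one randomly selected fan.
                target = rng.choice(fans[user])
                susceptible = target not in infected and target not in recovered
                if susceptible and rng.random() < infect_prob:
                    newly.add(target)
        # Assumption: users infected in this step cannot recover in the same step.
        healed = {u for u in sorted(infected) if rng.random() < recover_prob}
        infected = (infected - healed) | newly
        recovered |= healed
        history.append(len(infected) + len(recovered))
    return history


# Toy chain: user 1 is a fan of user 0, and user 2 is a fan of user 1.
print(sir_spread({0: [1], 1: [2]}, seeds=[0], infect_prob=1.0,
                 recover_prob=0.0, steps=3))
```

Running the same simulation from two different seed sets and comparing the returned curves mirrors the comparison between the top users of the two rankings.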

We show in Fig. 4(d) the quotient of the total infected in LeaderRank divided by that of PageRank, 
with different infection probability A. LeaderRank outperforms PageRank for various return probabilities 
and over the broad range of A shown. This reveals again a drawback of PageRank, as the optimal return 
probability has to be found by extensive parameter tests. The results imply that spreading from both 
LeaderRank and PageRank users is limited when A is small, but LeaderRank leads to a much wider 
opinion spreading when A is large. For a virus outbreak, if intensive immunizations are implemented on 
the top ranked LeaderRank users, the final outbreak would be less extensive. All the above results show 
that LeaderRank is more effective than PageRank in identifying highly influential users, and is thus a 
better candidate for spreading opinions and preventing a virus outbreak. 

Tolerance of Noisy Data 

Tolerance of ranking against spurious and missing links, i.e. false positive and false negative connections, 
is crucial when network structure is subject to noisy observations [17]. Social network data may be 
unreliable, especially when users are required to explicitly indicate relationships with others [18]. For example, 
it is hard to state whether neighbors are friends if they merely greet each other when they meet. The same happens for 
networks other than social networks but with a rather different cause. For example, protein connections 
obtained from biological experiments often include numerous false positives and false negatives [19]. Apart 
from ambiguous personal relationships, it is also costly and technically difficult to explore social networks 
comprehensively. Efforts have thus been made to predict the missing connections [20], and on such noisy 
networks we should develop ranking algorithms which are tolerant of spurious and missing links. 

To examine the tolerance of LeaderRank and PageRank against noisy data, we measure the change in 
scores and rankings when links are added or removed randomly. These links correspond to the spurious or 
missing relationships among leaders and fans. The scores obtained from the modified graph are compared 
to those from the original graph, by measuring the impact $I_s$ on score, as given by 

$$ I_s = \sum_{i=1}^{N} \left| S_i - S'_i \right|, \qquad (4) $$

where $S_i$ and $S'_i$ correspond to the scores obtained respectively from the original and modified graph. 
We measure $I_s$ for both LeaderRank and PageRank subject to the same modifications. As shown in 
Fig. 5(a), $I_s$ increases with the number of links added or removed. Remarkably, much smaller values of 
$I_s$ are obtained from LeaderRank when compared to PageRank, regardless of the addition or removal of 
links. In short, LeaderRank is more tolerant than PageRank of noisy topology, and thus has a 
high potential in applications on noisy social networks or protein-protein networks [21]. 

Since a small change in scores in LeaderRank may not directly correspond to a small change in ranking, 
we define a similar measure to examine the impact $I_R$ on ranking, given by 

$$ I_R = \sum_{i=1}^{N} \left| R_i - R'_i \right|, \qquad (5) $$

where $R_i$ and $R'_i$ are the ranks of user i obtained respectively from the original and modified graph. 
As shown in Fig. 5(b), a smaller difference between $I_R$ of LeaderRank and PageRank is observed as 
compared to $I_s$. Nevertheless $I_R$ of LeaderRank is smaller, as shown by $D = I_R^{\mathrm{PR}} - I_R^{\mathrm{LR}} > 0$ in 
the inset. Once again, these observations suggest that LeaderRank is more tolerant of topology 
randomness and hence a better candidate for ranking in noisy networks. 
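
The two impact measures can be computed with a few helper functions. The sketch below assumes that both measures are plain sums of absolute differences over users, which is an assumption about their exact form; the names `score_impact`, `ranks` and `rank_impact` are illustrative.

```python
def score_impact(original, modified):
    """Sum of absolute score changes (assumed form of the impact on score)."""
    return sum(abs(a - b) for a, b in zip(original, modified))


def ranks(scores):
    """1-based ranks, highest score first; ties keep their input order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, user in enumerate(order, start=1):
        result[user] = position
    return result


def rank_impact(original, modified):
    """Sum of absolute rank changes (assumed form of the impact on ranking)."""
    return score_impact(ranks(original), ranks(modified))


before = [1.0, 2.0, 3.0]
after = [1.5, 2.5, 2.0]
print(score_impact(before, after), rank_impact(before, after))
```

Applying both functions to the scores of each algorithm before and after the same random link changes gives the quantities compared in the text.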

Robustness against Spammers 

Malicious activities are common in social networks, in particular when users manipulate to gain skewed 
reputation [10]. One example of manipulation is called a Sybil attack [22], in which spammers deliberately 
create fake entities to obtain a disproportionately high rank. The problems become intolerable if this 
manipulation causes recommendation of bad commodities or biased opinion in social networks. In the WWW, 
there are also stories of companies manipulating the Google search engine to obtain higher ranks in search 
results [23]. Here we show that LeaderRank is more robust than PageRank against 
this type of attack. 

Specifically, we simulate the situation where a user creates v fake fans, and compare the ranking 
robustness in LeaderRank and PageRank. The horizontal axis of Figs. 6(a) and (b) shows respectively 
for LeaderRank and PageRank the original rank of a user, and the vertical axis shows his/her manipulated 
rank after the addition of v fake fans. Vertical downward shift from the dashed diagonal corresponds to 
the increase in rankings, and thus a successful manipulation. As we can see, LeaderRank is more robust 
against spammers as the change of rankings is much smaller than that by PageRank. These results show 
that LeaderRank is a better candidate for robust rankings against manipulations. 
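
The manipulation test can be sketched by adding fake fans to a user and comparing that user's rank before and after. Everything here is illustrative: the compact `leaderrank` helper uses a fixed number of sweeps that is assumed to be enough for small networks, and `rank_of`, `add_fake_fans` and the toy links are hypothetical names and data.

```python
def leaderrank(n, links, sweeps=500):
    """Compact LeaderRank: (fan, leader) links, ground node at index n."""
    out = [{n} for _ in range(n)] + [set(range(n))]
    for fan, leader in links:
        if fan != leader:
            out[fan].add(leader)
    s = [1.0] * n + [0.0]
    for _ in range(sweeps):  # fixed number of sweeps, assumed sufficient here
        nxt = [0.0] * (n + 1)
        for i, targets in enumerate(out):
            for j in targets:
                nxt[j] += s[i] / len(targets)
        s = nxt
    return [s[i] + s[n] / n for i in range(n)]


def rank_of(scores, user):
    """1-based rank: one plus the number of users with a strictly higher score."""
    return 1 + sum(1 for x in scores if x > scores[user])


def add_fake_fans(n, links, target, v):
    """Append v new users (hypothetical Sybil accounts) who all follow target."""
    fake_links = [(n + k, target) for k in range(v)]
    return n + v, list(links) + fake_links


links = [(2, 0), (3, 0), (4, 1)]   # user 0 has two fans, user 1 has one
before = rank_of(leaderrank(5, links), 1)
n2, links2 = add_fake_fans(5, links, target=1, v=10)
after = rank_of(leaderrank(n2, links2), 1)
print(before, after)
```

On a network this small the fake fans easily lift the target's rank; the experiment in the text asks how large this shift is on the full leadership network for each algorithm.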

Experiment 

To let readers better understand social influences as quantified by LeaderRank, we established a webpage 
http://rank.sesamr.com which uses LeaderRank to rank users in delicious.com. By providing their 
username, delicious users can easily obtain their rank and other information including the influence of 
leaders and fans. Users can also examine the change of their influence when they have new leaders 
and fans. For instance, the user hahyann519 had a low rank of 607512 before six other users found her 
important bookmarks and added her as a leader. She now has a rank of 99440, a much higher rank which 
shows the increase in her influence. 

Discussion 

After going through the above details, we may conclude that identifying influential users is not a simple 
task. It is not merely a matter of answering who is the best, but also of considering the influences and consequences 
brought by a ranking algorithm. These consequences are of particular importance for social networks, 
which are fundamentally different from networks of webpages. For instance, the ranking should be robust 
against noisy data and smart manipulations. This leads us to answer a much broader question than 
merely identifying the leaders, by devising a robust and generic algorithm. 

We suggest that LeaderRank may serve as a prototype of ranking algorithms applicable to rank users 
in social networks. As personal relationships are quickly evolving, the adaptive and parameter-free nature 
of LeaderRank eliminates the need of frequent calibration. In addition, this simple algorithm outperforms 
PageRank in several important aspects. In this paper, we see that LeaderRank identifies users who lead 
to quick and extensive spreading of opinions. This is important for online applications which feature 
information spreading. On the other hand, LeaderRank is tolerant of spurious and missing links, which 
benefits applications with noisy data, especially personal relationship. To deal with ranking loopholes, 
LeaderRank is robust against manipulations. These results make LeaderRank a good candidate for 
ranking users as well as other ranking tasks. 

Though LeaderRank is already an effective algorithm, extensions may lead to further improvement. 
For instance, the role of the ground node would be more prominent if weights are set on the in- and 
out-links to each node, according to its significance or other criteria. It can also be generalized to applications 
ranging from blog plagiarizer identification [24] to preventing species loss in ecosystems [25]. These simple 
modifications may lead to substantial improvements in performance. 

Identifying influential users in social networks is still a task that is easily overlooked. As 
online communities grow in popularity, leader identification may reinforce their development. 
This further facilitates collective search through online communities and may one day complement the 
current search paradigm. In the near future, technological advances will surely provide more information 
to quantify user influence, but at the same time will scale up the network size and make ranking tasks 
more challenging. The LeaderRank algorithm suggested here may serve as a potential candidate to face this challenge 
and make good use of the power of social influences. 

Figure Legends 




Figure 1. An illustration of the ground node and the LeaderRank algorithm. The social network 
consists of six users and 12 directed links. The final ranking scores are labeled next to the 
corresponding users. 

Figure 2. The number of fans of a user in descending order of the user rank by LeaderRank. The size 
of the solid circle is proportional to the value of Ui/Bi, i.e. the average number of times their collected 
bookmarks are saved by others. Users highlighted with the red circles have a small number of fans but 
a large value of Ui/Bi. On the contrary, users highlighted with the blue circles have a large number of 
fans but a small value of Ui/Bi. 

Figure 3. Users (a) cffcoach, (b) pedersoj, (c) kanter and (d) britta, who are ranked respectively at 
29th, 47th, 91st and 92nd by LeaderRank, as surrounded by their fans. The size of circles represents the 
average number of times their collected bookmarks are saved by others. 

Figure 4. The cumulative number of infected users (including recovered users), Ni, as a function of 
time, with the initially infected users being those who appear in the (a) top 20, (b) top 50, and (c) top 100 of either 
LeaderRank or PageRank (but not both). As we see from Table 1, in the top-20 case the initially infected 
users for LeaderRank are blackbeltjones, regine, zephoria and djakes, while those for PageRank are 
thetechguy, cffcoach, samoore and kevinrose. The infection probability is A = 0.5 and the return probability is set 
to 0.15 in PageRank. (d) As a function of A, the quotient of the number of infected users in 
LeaderRank divided by that of PageRank, expressed as fractional increase. 




Figure 5. The impact on (a) scores and (b) ranking as a function of number of links added and 
removed. Inset: (b) the difference in ranking mobility between LeaderRank and PageRank. 

Figure 6. The manipulated rank as obtained by (a) LeaderRank and (b) PageRank, after the addition 
of v fake fans, with v = 10, 50, 100. 



Table 1. Top 20 users ranked by the three approaches. A dash marks an empty entry. 

                 Ranking 
User ID          LeaderRank   PageRank   By the number of fans 
adobe            1            1          1 
twit             2            2          2 
wfryer           3            6          3 
willrich         4            7          - 
joshua           5            8          6 
hrheingold       7            15         12 
ewan.mcintosh    8            14         19 
dwarlick         9            19         14 
twitarmy         10           3          - 
merlinmann       11           16         - 
jdehaan          13           9          - 
regine           14           -          9 
lseymour         15           10         - 
jonhicks         16           -          - 
zephoria         17           -          15 
djakes           19           -          - 
secondlife       20           13         - 
thetechguy       -            4          - 
cffcoach         -            5          - 
samoore          -            18         - 
steverubel       -            -          7 
jgwalls          -            -          8 
ambermac         -            -          16 
jgates513        -            -          - 
ramitsethi       -            -          18 

Supporting Information 

1 Primitivity and Convergence 

We first show that the stochastic matrix P is primitive by showing that $P^6$ is positive, i.e. the elements in 
$P^6$ are all greater than zero. This is equivalent to showing that any pair of nodes is connected in exactly 6 
steps (6 hops). For nodes with at least one link, the ground node guarantees the co-existence of loops of size 
2 and 3. Starting at any node with 2 loops of size 2 and a path through the ground node, we can reach 
any other node (excluding the ground node but including itself) in exactly 6 steps. To reach the ground 
node in exactly 6 steps, we make use of one loop of size 3 and one loop of size 2 before hopping to the 
ground node. The same is true to reach the other nodes from the ground node. 

As $P$ is a right stochastic matrix, its transpose $P^T$ is the usual transition matrix under conventional 
matrix multiplication, such that $s(t_c) = P^T s(t_c)$. We then show that 1 is an eigenvalue of $P$, and 
thus of $P^T$. The matrix $P$, which is row-normalized, obviously has eigenvalue 1 with an eigenvector whose 
entries are all equal. To show the uniqueness of the eigenvector associated 
with eigenvalue 1, we assume that there exists another eigenvector $v$ for eigenvalue 1 with heterogeneous 
entries. Let $v_j$ be the entry of this eigenvector with $|v_j| \ge |v_i|$ for all $i$, and choose the sign of the eigenvector 
such that $v_j$ is positive. As $P$ is primitive, we consider a power $P^m$ in which all entries are positive. Because the 
entries of $v$ are heterogeneous, $v_i < v_j$ for at least one $i$, and the assumption leads to the following contradiction: 

$$ v_j = (P^m v)_j = \sum_i p'_{ji} v_i < \sum_i p'_{ji} v_j = v_j, \tag{S1} $$

where $p'_{ji}$ denotes the elements of $P^m$. The contradiction implies that for $P^m$, and hence $P$, no 
eigenvector with heterogeneous entries exists for eigenvalue 1, and thus $P^T$ has a unique eigenvector 
associated with eigenvalue 1, i.e., a unique steady state. 
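
The primitivity argument can be checked numerically on a small example. The sketch below is an illustration only, not the authors' code: the four-user network, the function names `leaderrank_matrix`, `matrix_power` and `steady_state`, and the convergence tolerance are all assumptions made for the example. It builds the row-normalized matrix $P$ with the ground node as the last index, tests whether every entry of $P^6$ is positive, and iterates $s \leftarrow P^T s$ starting from one unit of score on each user.

```python
def leaderrank_matrix(n, edges):
    """Row-stochastic matrix P for n users plus one ground node (index n).

    `edges` holds (fan, leader) pairs: a fan points to each of its leaders.
    The ground node is linked bidirectionally to every user.
    """
    size = n + 1
    adj = [[0.0] * size for _ in range(size)]
    for fan, leader in edges:
        adj[fan][leader] = 1.0
    for user in range(n):
        adj[user][n] = 1.0  # user -> ground node
        adj[n][user] = 1.0  # ground node -> user
    return [[a / sum(row) for a in row] for row in adj]


def matmul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def matrix_power(p, m):
    """Return p multiplied by itself m times (m >= 1)."""
    result = p
    for _ in range(m - 1):
        result = matmul(result, p)
    return result


def steady_state(p, n, tol=1e-12, max_iter=10_000):
    """Iterate s <- P^T s from one unit of score per user, none on the ground node."""
    size = len(p)
    s = [1.0] * n + [0.0]
    for _ in range(max_iter):
        nxt = [sum(p[i][j] * s[i] for i in range(size)) for j in range(size)]
        if max(abs(x - y) for x, y in zip(nxt, s)) < tol:
            return nxt
        s = nxt
    return s


# Hypothetical toy network: (fan, leader) pairs among four users.
edges = [(0, 1), (1, 2), (2, 0), (3, 0)]
P = leaderrank_matrix(4, edges)
P6 = matrix_power(P, 6)
print("P^6 strictly positive:", all(x > 0 for row in P6 for x in row))

s = steady_state(P, 4)
print("steady-state scores (last entry is the ground node):", [round(x, 4) for x in s])
print("total score:", round(sum(s), 6))
```

Because each row of $P$ sums to one, the iteration conserves the total score, so the printed total stays at the number of users.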



2 Differences between LeaderRank and PageRank 

The obvious difference between LeaderRank and PageRank lies in the formulation, where the ground node 
in LeaderRank plays an important role in regulating probability flows, making LeaderRank a parameter-free 
algorithm. A more essential difference lies at the heart of the dynamics. In LeaderRank, the score flow 
from node $i$ to the ground node is given by 

$$ f_{i \to g} = \frac{s_i(t_c)}{k_i^{\mathrm{out}}}, \tag{S2} $$

while in PageRank the score flow from node $i$ to a random node is given by 

$$ f_{i \to \mathrm{rand}} = c\, s_i(t_c), \tag{S3} $$

where $c$ is the return probability. As shown in Fig. S1, $f_{i \to g}$ in LeaderRank is inversely proportional 
to the out-degree of $i$, i.e., the number of leaders of $i$, as expected from the above equation. On the 
other hand, $f_{i \to \mathrm{rand}}$ in PageRank shows no obvious trend with the number of leaders. This observation 
corresponds to a fundamental difference between LeaderRank and PageRank. 
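
The contrast between the two flows can be illustrated on a toy network. The following is a minimal sketch under stated assumptions, not the authors' implementation: the four-user network and the function names `leaderrank_scores` and `pagerank_scores` are hypothetical, every user is assumed to have at least one leader (so PageRank has no dangling nodes), and the link to the ground node is counted among a user's out-links, so that a user with k leaders sends a fraction 1/(k+1) of its score to the ground node.

```python
def leaderrank_scores(leaders, tol=1e-12, max_iter=100_000):
    """leaders[i] lists the users that user i follows. Returns (user scores, ground score)."""
    n = len(leaders)
    s, g = [1.0] * n, 0.0
    for _ in range(max_iter):
        nxt, nxt_g = [g / n] * n, 0.0  # ground node spreads its score evenly
        for i, out in enumerate(leaders):
            share = s[i] / (len(out) + 1)  # +1 for the link to the ground node
            nxt_g += share
            for j in out:
                nxt[j] += share
        delta = max(abs(a - b) for a, b in zip(nxt + [nxt_g], s + [g]))
        s, g = nxt, nxt_g
        if delta < tol:
            break
    return s, g


def pagerank_scores(leaders, c=0.15, tol=1e-12, max_iter=100_000):
    """PageRank with return probability c; assumes every user has a leader."""
    n = len(leaders)
    s = [1.0] * n
    for _ in range(max_iter):
        nxt = [c * sum(s) / n] * n  # random jumps, spread evenly
        for i, out in enumerate(leaders):
            share = (1.0 - c) * s[i] / len(out)
            for j in out:
                nxt[j] += share
        delta = max(abs(a - b) for a, b in zip(nxt, s))
        s = nxt
        if delta < tol:
            break
    return s


# Hypothetical toy network: user 0 has three leaders, users 1 and 3 have one.
leaders = [[1, 2, 3], [0], [0, 1], [0]]
c = 0.15
s_lr, g = leaderrank_scores(leaders)
s_pr = pagerank_scores(leaders, c)

flow_to_ground = [s_lr[i] / (len(out) + 1) for i, out in enumerate(leaders)]
flow_to_random = [c * s_pr[i] for i in range(len(leaders))]

for i, out in enumerate(leaders):
    print(f"user {i}: leaders={len(out)}, "
          f"fraction to ground={flow_to_ground[i] / s_lr[i]:.3f}, "
          f"fraction to random={flow_to_random[i] / s_pr[i]:.3f}")
```

In this sketch the fraction of score leaving a user for the ground node shrinks as the number of leaders grows, whereas the fraction leaving for a random node in PageRank is the constant c for every user.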

We may interpret the physical reasons through the following examples. In social networks, the score donated 
to the ground node can be interpreted as the information obtained from random browsing, in contrast 
to the ordinary way of acquiring information from leaders. The ground node can thus be considered as 
a centralized leader who provides general information. We argue that fans who have a large number of 
leaders may acquire less information from each leader, including this centralized leader, leading to the 
relation in Fig. S1(a). A similar relation is observed in our empirical analyses of delicious data in Fig. S2, 
which shows that the ratio of saved bookmarks to the number of leaders decreases with $k^{\mathrm{out}}$ of the user. 
The same deduction can be obtained from the point of view of leaders. If we assume that the average 
number of bookmarks provided by each leader does not differ greatly, nodes with a small number of 
leaders receive only little information from leaders and thus have to acquire more information from 
the ground node. 

Fig. S1. The score flow from a node to (a) the ground node in LeaderRank and (b) random nodes in 
PageRank as a function of $k^{\mathrm{out}}$, the number of leaders. 

In terms of ranking, users with few leaders should have small voting rights for leaders; otherwise 
they may produce a strong bias if they donate all their score to only one or two leaders. LeaderRank, 
in which the score flow to the ground node is negatively correlated with out-degree (i.e., the flow to 
leaders is smaller from users with smaller out-degree), would lead to a better ranking when compared to 
PageRank. 

As the last example, web surfers on websites with small out-degree have a limited choice of 
hyperlinks and are more likely to jump to another random website. On the contrary, web surfers are more 
likely to follow hyperlinks if there are many of them on the website. Such cases correspond to a small 
flow from nodes with large $k^{\mathrm{out}}$ to the ground node, which is captured by LeaderRank. 

3 The top-100 ranked users 

Here we report the top 100 ranked users and their corresponding scores as obtained by LeaderRank, 
PageRank and the number of fans. As one unit of score is initialized on every node in LeaderRank and 
PageRank, the scores sum up to N in these two rankings. The last two columns show the top-100 users 
with the largest number of fans, and their corresponding number of fans. 



Table S1. Top 100 users ranked by LeaderRank, PageRank and the number of fans. 

| Rank | LeaderRank: User ID | LeaderRank: Score | PageRank (c=0.15): User ID | PageRank (c=0.15): Score | Number of fans: User ID | Fans # |
|---|---|---|---|---|---|---|
| 1 | adobe | 452 | adobe | 808 | adobe | 2768 |
| 2 | twit | 382 | twit | 726 | twit | 2422 |
| 3 | wfryer | 369 | twitarmy | 629 | wfryer | 1528 |
| 4 | willrich | 358 | thetechguy | 536 | willrich | 1466 |
| 5 | joshua | 264 | cffcoach | 529 | merlinmann | 1326 |
| 6 | cshirky | 234 | wfryer | 492 | joshua | 1296 |
| 7 | hrheingold | 217 | willrich | 475 | steverubel | 1284 |
| 8 | ewan.mcintosh | 214 | joshua | 375 | jgwalls | 1142 |
| 9 | dwarlick | 202 | jdehaan | 337 | regine | 1086 |
| 10 | twitarmy | 200 | lseymour | 334 | jonhicks | 956 |
| 11 | merlinmann | 186 | isola | 315 | kevinrose | 924 |
| 12 | blackbeltjones | 171 | cshirky | 294 | hrheingold | 894 |
| 13 | jdehaan | 170 | secondlife | 291 | cshirky | 837 |
| 14 | regine | 170 | ewan.mcintosh | 288 | dwarlick | 827 |
| 15 | lseymour | 168 | hrheingold | 285 | zephoria | 812 |
| 16 | jonhicks | 168 | merlinmann | 267 | ambermac | 781 |
| 17 | zephoria | 159 | jonhicks | 262 | jgates513 | 702 |
| 18 | isola | 159 | samoore | 261 | ramitsethi | 660 |
| 19 | djakes | 158 | dwarlick | 261 | ewan.mcintosh | 635 |
| 20 | secondlife | 156 | kevinrose | 256 | cory_arcangel | 613 |
| 21 | edtechtalk | 152 | iwantsandy | 249 | secondlife | 587 |
| 22 | steverubel | 150 | regine | 248 | brightideasguru | 586 |
| 23 | jgwalls | 142 | jgwalls | 234 | judell | 576 |
| 24 | kevinrose | 135 | steverubel | 222 | warrenellis | 566 |
| 25 | brightideasguru | 124 | edtechtalk | 214 | edtechtalk | 559 |
| 26 | jgates513 | 123 | zephoria | 212 | elisebauer | 545 |
| 27 | cogdog | 120 | nichoson | 210 | blackbeltjones | 541 |
| 28 | joi_ito | 119 | djakes | 206 | hokie62798 | 533 |
| 29 | cffcoach | 114 | blackbeltjones | 206 | djakes | 531 |
| 30 | hokie62798 | 113 | elisebauer | 203 | infosthetics | 527 |
| 31 | samoore | 112 | dr. coop | 178 | bibliodyssey | 509 |
| 32 | cityofsound | 112 | sdigrego | 172 | jakkarin | 476 |
| 33 | heyjude | 110 | ambermac | 161 | chrisbrogan | 474 |
| 34 | elisebauer | 108 | ureerat | 160 | russelldavies | 461 |
| 35 | veen | 104 | jgates513 | 160 | makemagazine | 461 |
| 36 | shareski | 102 | glass | 160 | ericerb | 455 |
| 37 | mathowie | 101 | brightideasguru | 159 | cityofsound | 454 |
| 38 | thetechguy | 101 | ramitsethi | 150 | jummumboy | 435 |
| 39 | judell | 100 | hokie62798 | 150 | jdawg | 433 |
| 40 | nichoson | 100 | cogdog | 148 | earlysound | 430 |
| 41 | ambermac | 99 | joi_ito | 146 | jzawodn | 429 |
| 42 | warrenellis | 96 | heyjude | 145 | cogdog | 428 |
| 43 | cory_arcangel | 93 | judell | 143 | mathowie | 421 |
| 44 | jutecht | 92 | cityofsound | 142 | plasticbag | 407 |
| 45 | tome | 92 | kawid | 141 | fredwilson | 407 |
| 46 | choconancy | 92 | ceonyc | 140 | shanselman | 406 |
| 47 | pedersoj | 91 | jdawg | 139 | heyjude | 405 |
| 48 | mamamusings | 91 | bearsgonewild | 136 | leolaporte | 404 |
| 49 | sdigrego | 91 | warrenellis | 136 | joi_ito | 385 |
| 50 | linkorama | 90 | benchaporn | 134 | samoore | 384 |
| 51 | plasticbag | 90 | veen | 130 | cursonl2005 | 381 |
| 52 | sebpaquet | 88 | shareski | 129 | miyagawa | 364 |
| 53 | ramitsethi | 87 | mathowie | 127 | veen | 363 |
| 54 | snbeach50 | 83 | choconancy | 126 | tuckermax | 363 |
| 55 | ureerat | 81 | shanselman | 126 | kanter | 359 |
| 56 | jdawg | 81 | jutecht | 126 | choconancy | 354 |
| 57 | teach42 | 79 | linkorama | 124 | deusx | 351 |
| 58 | jakkarin | 78 | kick_out_the_internet_jams | 123 | aengle | 351 |
| 59 | benchaporn | 78 | cory_arcangel | 123 | lomo | 350 |
| 60 | budtheteacher | 77 | sclmav | 121 | bren | 344 |
| 61 | infosthetics | 75 | pedersoj | 119 | wearehugh | 342 |
| 62 | jzawodn | 75 | fju_web20 | 114 | 53os | 342 |
| 63 | raelity | 73 | mamamusings | 113 | 101cookbooks | 340 |
| 64 | chrisdodo | 72 | tome | 113 | ginatrapani | 336 |
| 65 | fredwilson | 70 | sebpaquet | 111 | angusf | 333 |
| 66 | timo | 70 | bibliodyssey | 111 | zheng | 331 |
| 67 | elemenous | 69 | apluscert | 111 | megsie | 331 |
| 68 | bibliodyssey | 69 | alexdroege | 109 | britta | 327 |
| 69 | iteachdigital | 69 | plasticbag | 109 | benchaporn | 321 |
| 70 | timlauer | 69 | madro | 108 | teach42 | 319 |
| 71 | fstutzman | 69 | lisalis | 108 | knowhow | 312 |
| 72 | foe | 69 | fredwilson | 106 | tome | 312 |
| 73 | migurski | 69 | infosthetics | 105 | snbeach50 | 307 |
| 74 | russelldavies | 68 | williams_jeff | 104 | marisaolson | 305 |
| 75 | alexdroege | 67 | 101cookbooks | 104 | fstutzman | 301 |
| 76 | cursonl2005 | 66 | cablack | 104 | edans | 300 |
| 77 | shanselman | 65 | snbeach50 | 103 | jasonmcalacanis | 298 |
| 78 | twitter_edtech | 65 | jzawodn | 103 | williams_jeff | 292 |
| 79 | kick_out_the_internet_jams | 64 | wswu | 103 | yugop | 290 |
| 80 | msippey | 63 | daveprol4 | 102 | wangl | 290 |
| 81 | qdsouza | 62 | pamanapa | 100 | dhinchcliffe | 288 |
| 82 | anne | 62 | fju_webfund | 100 | ani625 | 288 |
| 83 | brasst | 62 | teach42 | 99 | music | 287 |
| 84 | aengle | 61 | tarisamatsumoto | 98 | elemenous | 284 |
| 85 | ceonyc | 61 | fju_univintro | 96 | toxi | 282 |
| 86 | kfisch | 61 | russelldavies | 95 | google | 281 |
| 87 | ehubbell | 60 | makemagazine | 95 | shareski | 278 |
| 88 | makemagazine | 60 | fju_inetcomp | 95 | mbauwens | 275 |
| 89 | 101cookbooks | 59 | clydekmann | 93 | design | 275 |
| 90 | dr. coop | 58 | atrusty | 92 | mediaeater | 274 |
| 91 | kanter | 58 | budtheteacher | 92 | ehubbell | 271 |
| 92 | britta | 58 | elemenous | 91 | imao | 270 |
| 93 | courosa | 58 | fstutzman | 90 | ureerat_wat | 267 |
| 94 | mguhlin | 57 | twitter_edtech | 90 | ma.la | 265 |
| 95 | marisaolson | 56 | cursonl2005 | 90 | alexdroege | 265 |
| 96 | williams_jeff | 56 | timo | 89 | jeweLlee27 | 264 |
| 97 | tuckermax | 56 | raelity | 89 | linkorama | 262 |
| 98 | jummumboy | 56 | iteachdigital | 89 | raganwald | 261 |
| 99 | district6 | 56 | shiang | 88 | brasst | 261 |
| 100 | chrislehmann | 55 | knowhow | 87 | budtheteacher | 260 |

4 Zipf's law 

As shown in Fig. S3, Zipf's law is observed for all three ranking algorithms. We plot the score of 
each user against his/her rank and observe a power-law decay. Notice that, although a similar relation 
between score and rank is observed among the three algorithms, the ranking of individuals differs between 
algorithms. 
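
A power-law decay of score with rank appears as a straight line on log-log axes, so its exponent can be estimated roughly by a least-squares fit in log space. The sketch below is only an illustration under that assumption; the function name `fit_power_law` is hypothetical, and the text does not state how, or whether, the authors fitted an exponent. The example input is the ten highest LeaderRank scores listed in Table S1.

```python
import math


def fit_power_law(scores):
    """Least-squares fit of log(score) = log(a) - alpha * log(rank).

    Returns (alpha, a) for the model score ~ a * rank**(-alpha).
    """
    ranked = sorted(scores, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(score) for score in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


# Ten highest LeaderRank scores from Table S1.
top10_leaderrank = [452, 382, 369, 358, 264, 234, 217, 214, 202, 200]
alpha, prefactor = fit_power_law(top10_leaderrank)
print(f"estimated exponent: {alpha:.3f}, prefactor: {prefactor:.1f}")
```

A fit over ten points is far too short to characterise the tail; a serious estimate would use the full score list and a dedicated power-law estimator.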

5 Comparisons among ranking results from different ranking algorithms 

We show in Fig. S4 the overlap of rankings between LeaderRank and PageRank, as well as between LeaderRank and 
the number of fans. We also plot the overlap between PageRank and the number of fans for reference. 
These results show that LeaderRank is closer to PageRank than to ranking merely by the number of fans, 
and both LeaderRank and PageRank show positive correlation with the number of fans. Though the rankings 
from LeaderRank and PageRank seem to have a large overlap, the rankings of individuals are different, as 
can be seen in Table S1. As shown in Fig. S5, the average number of leaders of the top users as ranked by 
PageRank is always smaller than that by LeaderRank. This implies that PageRank tends to assign high 
rank to nodes with a small number of leaders, which is unfair to nodes with a large number of leaders. We 
emphasize again that individual rankings are different, though the shapes of the curves from LeaderRank and 
PageRank look similar. 
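
As a concrete illustration of the overlap comparison, the sketch below computes the overlap of two top-L lists as the fraction of users they share. That definition and the function name `top_l_overlap` are assumptions, since the text does not spell out the exact measure. The three lists are the top 10 users of each ranking as given in Table S1.

```python
def top_l_overlap(ranking_a, ranking_b, l):
    """Fraction of users shared by the top-l entries of two rankings."""
    return len(set(ranking_a[:l]) & set(ranking_b[:l])) / l


# Top 10 users of each ranking, taken from Table S1.
leaderrank_top = ["adobe", "twit", "wfryer", "willrich", "joshua", "cshirky",
                  "hrheingold", "ewan.mcintosh", "dwarlick", "twitarmy"]
pagerank_top = ["adobe", "twit", "twitarmy", "thetechguy", "cffcoach", "wfryer",
                "willrich", "joshua", "jdehaan", "lseymour"]
fans_top = ["adobe", "twit", "wfryer", "willrich", "merlinmann", "joshua",
            "steverubel", "jgwalls", "regine", "jonhicks"]

print(top_l_overlap(leaderrank_top, pagerank_top, 10))  # 0.6
print(top_l_overlap(leaderrank_top, fans_top, 10))      # 0.5
print(top_l_overlap(pagerank_top, fans_top, 10))        # 0.5
```

Note that two lists can share most of their members while ordering them differently, which is why a large overlap does not imply identical individual rankings.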

6 Negative effect of removing leaders 

We show in Fig. S6 that removing all of a user's leaders has a negative effect on his/her rank. 
As we can see for both LeaderRank and PageRank, many users are lower in rank after removing their 
leaders. These results suggest that considering the leaders alone provides no absolute measure of 
influence, as removing a user's entire upstream connection to leaders may have a negative effect on the 
social influence of an influential user. In other words, we have to consider the entire upstream topology 
to quantify the social influence of a user. 

Fig. S3. The score as a function of rank obtained from LeaderRank, PageRank and ranking by the 
number of fans. Zipf's law is observed for these algorithms. 

Fig. S4. The overlap between LeaderRank and PageRank, between LeaderRank and ranking by the 
number of fans, and between PageRank and ranking by the number of fans, for the top-L users. 

Fig. S5. The average number of leaders of the top-L users as ranked by LeaderRank and PageRank. 
Inset: the average number of leaders against the logarithm of L. 

Fig. S6. The rank of a user after removing all his/her leaders, as compared to his/her original rank as 
obtained by (a) LeaderRank and (b) PageRank. The black solid line corresponds to the equality of the 
new and original rank. 
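
The leader-removal experiment behind Fig. S6 can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' procedure: the five-user network and the function names `leaderrank`, `rank_of` and `rank_after_removing_leaders` are hypothetical, ties are broken by user index, and the link to the ground node is counted among a user's out-links. The sketch ranks a user once on the original network and once after deleting all of that user's links to leaders.

```python
def leaderrank(leaders, tol=1e-12, max_iter=100_000):
    """leaders[i] lists the users that user i follows. Returns the user scores."""
    n = len(leaders)
    s, g = [1.0] * n, 0.0
    for _ in range(max_iter):
        nxt, nxt_g = [g / n] * n, 0.0  # ground node spreads its score evenly
        for i, out in enumerate(leaders):
            share = s[i] / (len(out) + 1)  # +1 for the link to the ground node
            nxt_g += share
            for j in out:
                nxt[j] += share
        delta = max(abs(a - b) for a, b in zip(nxt + [nxt_g], s + [g]))
        s, g = nxt, nxt_g
        if delta < tol:
            break
    return s


def rank_of(scores, user):
    """1-based rank of `user`, highest score first, ties broken by index."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return order.index(user) + 1


def rank_after_removing_leaders(leaders, user):
    """Return (original rank, rank after deleting all of the user's leaders)."""
    pruned = [list(out) for out in leaders]  # copy, so the input is untouched
    pruned[user] = []
    return rank_of(leaderrank(leaders), user), rank_of(leaderrank(pruned), user)


# Hypothetical toy network of five users.
leaders = [[1, 2], [0, 2], [0], [0, 1], [1]]
for user in range(len(leaders)):
    old_rank, new_rank = rank_after_removing_leaders(leaders, user)
    print(f"user {user}: original rank {old_rank}, rank without leaders {new_rank}")
```

Plotting the new rank against the original rank for every user gives the kind of scatter shown in Fig. S6, with points on the diagonal for users whose rank is unchanged.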



